ABSTRACT


Bayesian linear regression models, where the parameters follow simple trends,
can be efficiently solved using credibility approximations and recursive
calculations which exploit the special structure of the problem.
CREDIBILITY REGRESSION WITH SIMPLE TRENDS


by 


William S. Jewell


1. BAYESIAN REGRESSION

Consider a linear (regression) model

(1.1)  $y = X\beta + u$,

where $y$ and $u$ are $n \times 1$ random vectors of observable output variables
and unobservable error variables, respectively, $X$ is a known $n \times k$ design
matrix, and $\beta$ is a $k \times 1$ random vector of unknown regression
parameters. We assume that a prior joint density of $(\beta, u)$ is known. Given an
observed value of $y$, the problem is to draw posterior-to-data inferences about
$\beta$, or about future values of $y$ for some different design matrix; this is a
problem in Bayesian regression [see, e.g., Zellner (1971)].

Many insurance rate-making models are of this type. With one parameter (an
unknown mean) and $X = [1, 1, \ldots, 1]'$, we have classical credibility theory;
multi-dimensional credibility theory has several such means, and $X$ contains
blocks of unit matrices. In additive relativity premium models, there are groups
of different factors, with one parameter from each group added to make up the
total premium; $X$ consists of echelon patterns of 1's and 0's [Grimes (1971)].




Of particular interest in these changing times are linear models in which the
parameters are subject to inflation by unknown amounts. Although there is no
formal difficulty in including trends in the Bayesian framework, there are
important practical difficulties, due to the necessity of providing
full-information priors, and the resulting large dimensionality of real problems.

This paper explores the case where the parameters follow a simple but unknown
trend, and simplifications are possible by using credibility theory and an
iterative computational scheme.


2. CREDIBILITY REGRESSION

A complete Bayesian regression analysis is very difficult, usually requiring
restrictive distributional assumptions or complicated algebraic manipulations
[see, e.g., Box and Tiao (1973), Morales (1971), and Zellner (1971)]. However,
recent work [Hachemeister (1974); Taylor (1974); Jewell (1975)] has shown that the
linearized approach of credibility theory can be very useful for general models
like (1.1), if the goal is to update only mean values; preposterior (average
before-the-observation) covariances can also be determined.

Let the prior knowledge of $(\beta, u)$ be summarized in the mean vectors:

(2.1)  $E\{\beta\} = b; \quad E\{u \mid \beta\} = 0$ (for all $\beta$);

and the covariance matrices:

(2.2)  $V\{\beta\} = A; \quad E\,V\{y \mid \beta\} = V\{u\} = \Sigma$;

of order $k \times k$ and $n \times n$, respectively. We also define
alternate-dimension versions of the covariances:

(2.3)  $D = XAX'; \quad \epsilon = (X'\Sigma^{-1}X)^{-1}$,

which are $n \times n$ and $k \times k$, respectively. Even if $\Sigma$ is positive
definite (most applications have $\Sigma$ diagonal), $\epsilon$ may not exist in many
linear models of interest because $X$ does not have full column rank $k$.

Jewell (1975) shows that there are two versions of the updated credibility
forecast of the mean parameter values, $f(y) \approx E\{\beta \mid y\}$. In the first
version:

(2.4)  $f(y) = (I_k - ZX)b + Zy$,

where $I_k$ is the $k \times k$ unit matrix, and $Z$ is a $k \times n$ credibility
matrix:

(2.5)  $Z = AX'(\Sigma + D)^{-1}$.


This clearly exists if, say, $\Sigma$ is positive definite and $X$ contains only
nonnegative elements; an $n \times n$ inversion is required, even if $\Sigma^{-1}$ is
known, hence this form is suitable for limited-observation experiments where
$n < k$. (Parenthetically, note that $n$ refers to different dimensions of
observations, not the actual volume of observations; if we have $v_i$ samples,
$y_{i1}, y_{i2}, \ldots$, in dimension $i$, we can aggregate, using
$y_i = \sum_j y_{ij}/v_i$, making appropriate adjustments in $\Sigma$.)
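A minimal numerical sketch of the first version (2.4)-(2.5), not the paper's own computation. It assumes matrices stored as plain Python lists of rows; the helper names (`T`, `mm`, `lin`, `eye`, `inv`, `first_version`) are ours and the data are made up. The example deliberately has $n = 2 < k = 3$, so $X$ has rank 2, $X'\Sigma^{-1}X$ is singular and $\epsilon$ does not exist, yet (2.4) still yields a forecast with a single $2 \times 2$ inversion.

```python
# Illustrative sketch of the first version (2.4)-(2.5); helper names are hypothetical.

def T(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def mm(*Ms):
    """Left-to-right product of two or more matrices."""
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in out]
    return out

def lin(A, B, s=1.0):
    """Elementwise A + s * B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def inv(M):
    """Gauss-Jordan inverse with partial pivoting; adequate for small, well-conditioned M."""
    n = len(M)
    aug = [[float(v) for v in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def first_version(X, A, Sigma, b, y):
    """Return f(y) = (I_k - Z X) b + Z y and Z = A X'(Sigma + D)^{-1}, with D = X A X'."""
    D = mm(X, A, T(X))                          # n x n, as in (2.3)
    Z = mm(A, T(X), inv(lin(Sigma, D)))         # k x n; the only inversion is n x n
    I_ZX = lin(eye(len(A)), mm(Z, X), -1.0)     # I_k - Z X
    return lin(mm(I_ZX, b), mm(Z, y)), Z

# Made-up example with n = 2 < k = 3: epsilon in (2.3) does not exist.
X = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
A = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
Sigma = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [1.0], [1.0]]
y = [[3.0], [2.0]]

f, Z = first_version(X, A, Sigma, b, y)
print([round(row[0], 4) for row in f])   # [1.4762, 1.2857, 0.8095]
```

Since $D = XAX'$ is positive semidefinite, $\Sigma + D$ is positive definite whenever $\Sigma$ is, so the $n \times n$ inversion succeeds here even though $\epsilon$ does not exist.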
In the second version, we obtain

(2.6)  $f(y) = (I_k - z)b + z\hat{\beta}(y)$,

where $\hat{\beta}(y)$ is the classical (generalized) least-squares estimator of $\beta$:

(2.7)  $\hat{\beta}(y) = \epsilon X'\Sigma^{-1}y = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$,

and $z$ is a $k \times k$ credibility matrix:

(2.8)  $z = (I_k + \epsilon A^{-1})^{-1} = A(I_k + \epsilon^{-1}A)^{-1}\epsilon^{-1}$.


This matrix is analogous to the usual multidimensional credibility matrix with
"one" sample [Jewell (1974)] and gives a more readily interpreted mixing of prior
mean and classical estimator. Moreover, in many applications $k < n$, and the
second form in (2.8) shows that only one $k \times k$ inversion is required to find
$f(y)$ if $\Sigma^{-1}$ is known, which greatly reduces the computational labor
(since $z\hat{\beta}(y) = A(I_k + \epsilon^{-1}A)^{-1}X'\Sigma^{-1}y$, $\epsilon$ itself is
never needed). On the other hand, to find $\hat{\beta}(y)$ explicitly, we require that
$\epsilon$ exist, which leads to the classic problem of "identifiability" and the
requirements that $\operatorname{rank}(X) = k$ and $n \ge k$.
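As a worked check on (2.6)-(2.8), the one-parameter case mentioned in Section 1 reduces to classical credibility. Assume, for illustration only, $k = 1$, $X = [1, 1, \ldots, 1]'$, prior variance $A = a$, and $\Sigma = \sigma^2 I_n$ (independent observations with a common variance $\sigma^2$):

$$
\epsilon^{-1} = X'\Sigma^{-1}X = \frac{n}{\sigma^2}, \qquad
\hat{\beta}(y) = \epsilon X'\Sigma^{-1}y = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y},
$$
$$
z = (1 + \epsilon a^{-1})^{-1} = \frac{a}{a + \sigma^2/n} = \frac{n}{n + \sigma^2/a}, \qquad
f(y) = (1 - z)\,b + z\,\bar{y}.
$$

This is the familiar credibility weighting of the prior mean $b$ and the sample mean $\bar{y}$, with credibility factor $n/(n + \sigma^2/a)$.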

For example, in the usual analysis of additive rate relativities, one adds extra
constraints on the parameters so that $X$ will be of rank $k$, and $\hat{\beta}(y)$
will exist [Grimes (1971)]. This is not necessary in a Bayesian regression, so
long as $n$ is not so large that $Z$ or $z$ is ill-conditioned. Of course, there
may be external reasons, such as economic equity, for using only models in which
$X$ has full rank; in this case, one can show that, for "stable" increasing
designs, $z \to I_k$ as $n \to \infty$ [Jewell (1975)].

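A sketch that checks, on made-up full-rank data, that the two versions give the same forecast, and that $z$ moves toward $I_k$ as the design grows. It is illustrative only: the helper names are hypothetical, and "replicate the design $m$ times with the same $\Sigma$" is our stand-in for a "stable" increasing design; under that assumption $\epsilon^{-1} = X'\Sigma^{-1}X$ is simply multiplied by $m$.

```python
# Illustrative check that (2.4) and (2.6) agree, and that z -> I_k for a replicated design.

def T(A):
    return [list(col) for col in zip(*A)]

def mm(*Ms):
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in out]
    return out

def lin(A, B, s=1.0):
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def inv(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [[float(v) for v in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def maxdiff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def first_version(X, A, Sigma, b, y):
    """(2.4)-(2.5): one n x n inversion."""
    Z = mm(A, T(X), inv(lin(Sigma, mm(X, A, T(X)))))
    return lin(mm(lin(eye(len(A)), mm(Z, X), -1.0), b), mm(Z, y))

def z_matrix(A, eps_inv):
    """Second form of (2.8): z = A (I_k + eps^{-1} A)^{-1} eps^{-1}; one k x k inversion."""
    return mm(A, inv(lin(eye(len(A)), mm(eps_inv, A))), eps_inv)

def second_version(X, A, Sigma, b, y):
    """(2.6)-(2.8): f = (I_k - z) b + z beta_hat, with beta_hat the GLS estimator (2.7)."""
    Si = inv(Sigma)                              # Sigma^{-1}, assumed available
    eps_inv = mm(T(X), Si, X)                    # X' Sigma^{-1} X; invertible only if rank(X) = k
    beta_hat = mm(inv(eps_inv), T(X), Si, y)     # (2.7)
    z = z_matrix(A, eps_inv)
    return lin(mm(lin(eye(len(A)), z, -1.0), b), mm(z, beta_hat)), eps_inv

# Made-up full-rank example: intercept and slope (k = 2), four observations (n = 4).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Sigma = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
A = [[2.0, 0.5], [0.5, 1.0]]
b = [[0.0], [1.0]]
y = [[1.0], [2.0], [4.0], [5.0]]

f1 = first_version(X, A, Sigma, b, y)
f2, eps_inv = second_version(X, A, Sigma, b, y)
print(maxdiff(f1, f2))    # rounding-level difference only

# m copies of X with the same Sigma give eps^{-1} = m X' Sigma^{-1} X, so z should approach I_k.
dev = [maxdiff(z_matrix(A, [[m * v for v in row] for row in eps_inv]), eye(2))
       for m in (1, 10, 100)]
print(dev)
```

With $\Sigma$ diagonal, `inv(Sigma)` could be replaced by reciprocals of the diagonal; the general routine is used only to keep the sketch short.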
The preposterior covariance of the parameter estimation error can be shown to
be:

(2.9)  $\Phi = V\{\beta - f(y)\} = (I_k - ZX)A = (I_k - z)A = (A^{-1} + \epsilon^{-1})^{-1}$.

Since the precision (inverse covariance) in estimating $\beta$ was $A^{-1}$,
prior-to-data, we see that, on the average, a forecast using $X$ increases the
precision by $\epsilon^{-1}$; alternately, $\Phi$ is the "$A$" we expect to have, on
the average, as our estimate goes from $b$ to $f$.
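A short verification that the forms in (2.9) coincide, assuming $A$ and $\epsilon$ are invertible and using $z = (I_k + \epsilon A^{-1})^{-1} = A(A + \epsilon)^{-1}$ from (2.8):

$$
(I_k - z)A = \bigl(I_k - A(A+\epsilon)^{-1}\bigr)A = \epsilon(A+\epsilon)^{-1}A
= \bigl[A^{-1}(A+\epsilon)\epsilon^{-1}\bigr]^{-1} = (A^{-1} + \epsilon^{-1})^{-1}.
$$

The form $(I_k - ZX)A$ follows from the matrix inversion lemma, $(A^{-1} + X'\Sigma^{-1}X)^{-1} = A - AX'(\Sigma + XAX')^{-1}XA$, which does not require $\epsilon$ itself to exist.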

Hachemeister (1974) and Taylor (1974; 1975) have both given special versions of
(2.6), (2.7). There are also numerous non-Bayesian versions [Theil (1963); Rao
(1965)]. However, priority for both forms belongs to the communications theory
literature, where generalized least-squares methods have been used for linear
(Wiener-Kalman-Bucy) filter estimation problems for many years [see, for example,
Sage and Melsa (1971, pp. 182-4)]. Further historical remarks are in Jewell (1975).
3. ITERATIVE CALCULATIONS

An interesting feature of credibility regression is the possibility of cascading
or serially combining several experiments through recursive calculations. Let
$X_1, X_2, \ldots, X_t$ be the design matrices for experiments $1, 2, \ldots, t$ in
which $\beta$ remains the same, but vectors $y_1, y_2, \ldots, y_t$ are observed,
with known observational error covariance matrices
$\Sigma_{11}, \Sigma_{22}, \ldots, \Sigma_{tt}$ for each experiment; the observational
dimension $n_t$ may vary from experiment to experiment, but we assume observational
independence between experiments, i.e. $C\{u_s; u_t \mid \beta\} = 0$ $(s \ne t)$ for
all $\beta$.

One possibility for calculation is to combine all experiments into a single large
model (1.1), with $y' = [y_1' : y_2' : \cdots]$, $X' = [X_1' : X_2' : \cdots]$,
and $\Sigma = \operatorname{diag}(\Sigma_{11}, \Sigma_{22}, \ldots)$; the dimensions
will be $n = \sum_t n_t$ by $k$. Even if the second version (2.6) is used, there is
a fair amount of simultaneous computation to perform before the single $k \times k$
inversion; furthermore, if the data-gathering is, in fact, sequential in time,
then successive forecasts become more and more inefficient.

In Jewell (1975), it is shown that equivalent computations can be performed in the
following recursive manner:

(1)  Initialize by defining $b(1) = b$ and $A(1) = A$.

(2)  For period $t$, assume that the current prior moments, $b(t)$ and $A(t)$, are
given. Using these and $y_t$, $X_t$, and $\Sigma_{tt}$ from the current experiment,
compute an updated forecast of $\beta$, call it $f_t(y_t)$, from (2.4) or (2.6), and
an error covariance $\Phi_t$ from (2.9).

(3)  Continue the computation for period $t + 1$ by setting

(3.1)  $b(t + 1) = f_t(y_t); \quad A(t + 1) = \Phi_t$;

and repeat Step 2.
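A minimal sketch of the recursive scheme, assuming plain Python lists for matrices and made-up data for three experiments with $n_t = 2, 1, 3$ and $k = 2$. Each stage applies the first version (2.4)-(2.5) and takes $\Phi_t$ from (2.9); the helper names (`update`, `block_diag`, and the small matrix routines) are illustrative, not from the paper. The last lines build the single stacked model so the two answers can be compared.

```python
# Illustrative sketch of the recursive scheme (3.1) versus the stacked single model.

def T(A):
    return [list(col) for col in zip(*A)]

def mm(*Ms):
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in out]
    return out

def lin(A, B, s=1.0):
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def inv(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [[float(v) for v in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def maxdiff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def block_diag(*Ms):
    """Block-diagonal matrix diag(M_1, M_2, ...)."""
    n = sum(len(M) for M in Ms)
    out = [[0.0] * n for _ in range(n)]
    off = 0
    for M in Ms:
        for i, row in enumerate(M):
            out[off + i][off:off + len(row)] = row
        off += len(M)
    return out

def update(b_t, A_t, X_t, S_t, y_t):
    """Step (2): forecast by (2.4)-(2.5) and Phi_t = (I_k - Z X_t) A(t) by (2.9)."""
    Z = mm(A_t, T(X_t), inv(lin(S_t, mm(X_t, A_t, T(X_t)))))
    I_ZX = lin(eye(len(A_t)), mm(Z, X_t), -1.0)
    return lin(mm(I_ZX, b_t), mm(Z, y_t)), mm(I_ZX, A_t)

# Made-up experiments (X_t, Sigma_tt, y_t) with n_t = 2, 1, 3 and k = 2.
experiments = [
    ([[1.0, 0.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 2.0]], [[1.2], [2.1]]),
    ([[1.0, 2.0]], [[0.5]], [[2.9]]),
    ([[1.0, 3.0], [1.0, 4.0], [0.0, 1.0]],
     [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]], [[4.2], [5.1], [0.8]]),
]
b = [[0.0], [1.0]]
A = [[2.0, 0.3], [0.3, 1.0]]

b_t, A_t = b, A                                   # Step (1)
for X_t, S_t, y_t in experiments:                 # Steps (2)-(3) with (3.1)
    b_t, A_t = update(b_t, A_t, X_t, S_t, y_t)

# All at once: stack the y_t and X_t, and take Sigma = diag(Sigma_11, Sigma_22, ...).
X_all = [row for X_t, _, _ in experiments for row in X_t]
y_all = [row for _, _, y_t in experiments for row in y_t]
S_all = block_diag(*[S_t for _, S_t, _ in experiments])
f_all, Phi_all = update(b, A, X_all, S_all, y_all)
print(maxdiff(b_t, f_all), maxdiff(A_t, Phi_all))
```

Swapping the first version for the second inside `update` would change only which inversion is performed at each stage.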
This iterative process replaces the all-at-once computations by a sequence of
smaller ones, with a choice of whether to use an $n_t$- or $k$-order inversion at
each stage. In fact, if all the $\Sigma_{tt}$ are diagonal matrices (each dimension
of observation error independent), then one could iterate through every row of
all the $X_t$, so that each inversion reduces to a scalar division.
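When every $\Sigma_{tt}$ is diagonal, the row-by-row variant can be sketched as follows: each observation is treated as a scalar experiment, so $Z$ in (2.5) needs only a division. The batch check uses the information form $\Phi^{-1} = A^{-1} + X'\Sigma^{-1}X$, $f = \Phi(A^{-1}b + X'\Sigma^{-1}y)$, which follows from (2.6) and (2.9) because $I_k - z = \Phi A^{-1}$ and $z = \Phi\epsilon^{-1}$. The function names (`row_update`, `batch`) and the data are hypothetical.

```python
# Illustrative row-by-row recursion for diagonal Sigma (hypothetical names and data).

def row_update(b, A, x, s, yv):
    """One scalar observation yv = x beta + u with Var(u) = s, via (2.4), (2.5), (2.9)."""
    k = len(b)
    Ax = [sum(A[i][j] * x[j] for j in range(k)) for i in range(k)]          # A x'
    denom = s + sum(x[i] * Ax[i] for i in range(k))                         # s + x A x', a scalar
    Z = [v / denom for v in Ax]                                             # (2.5): division only
    resid = yv - sum(x[i] * b[i] for i in range(k))                         # y - x b
    new_b = [b[i] + Z[i] * resid for i in range(k)]                         # (2.4)
    new_A = [[A[i][j] - Z[i] * Ax[j] for j in range(k)] for i in range(k)]  # (2.9), A symmetric
    return new_b, new_A

def inv(M):
    """Gauss-Jordan inverse with partial pivoting, used only for the batch check."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def batch(b, A, rows, variances, ys):
    """Information form: Phi^{-1} = A^{-1} + X' Sigma^{-1} X, f = Phi (A^{-1} b + X' Sigma^{-1} y)."""
    k = len(b)
    Ai = inv(A)
    P = [[Ai[i][j] + sum(x[i] * x[j] / s for x, s in zip(rows, variances))
          for j in range(k)] for i in range(k)]
    r = [sum(Ai[i][j] * b[j] for j in range(k))
         + sum(x[i] * yv / s for x, s, yv in zip(rows, variances, ys)) for i in range(k)]
    Phi = inv(P)
    return [sum(Phi[i][j] * r[j] for j in range(k)) for i in range(k)], Phi

# Made-up data: six scalar observations of a k = 2 parameter vector, diagonal Sigma.
rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [0.0, 1.0]]
variances = [1.0, 2.0, 0.5, 1.0, 1.0, 3.0]
ys = [1.2, 2.1, 2.9, 4.2, 5.1, 0.8]
b, A = [0.0, 1.0], [[2.0, 0.3], [0.3, 1.0]]

b_t, A_t = b, A
for x, s, yv in zip(rows, variances, ys):
    b_t, A_t = row_update(b_t, A_t, x, s, yv)
f_batch, Phi_batch = batch(b, A, rows, variances, ys)
print(b_t, f_batch)
```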

It should be emphasized that the final $\Phi_t$ is still a preposterior covariance,
equal to:

(3.2)  $\Phi_t^{-1} = A^{-1} + \sum_{i=1}^{t} \epsilon^{-1}(i); \quad \epsilon^{-1}(i) = X_i'\Sigma_{ii}^{-1}X_i$,

and is not updated by the $y_t$. With more specific distributional assumptions, one
could in principle update the covariance as well; however, for problems in which
control of the variance is appropriate, one would probably use different,
nonstationary models, and different techniques, such as Box-Jenkins forecasting
or Wiener-Kalman filtering.
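Equation (3.2) follows by telescoping (2.9) through (3.1). A short derivation, with $\epsilon^{-1}(t) = X_t'\Sigma_{tt}^{-1}X_t$, $A(1) = A$, and $A(t) = \Phi_{t-1}$ for $t \ge 2$:

$$
\Phi_t^{-1} = A(t)^{-1} + \epsilon^{-1}(t) = \Phi_{t-1}^{-1} + \epsilon^{-1}(t) = \cdots = A^{-1} + \sum_{i=1}^{t} \epsilon^{-1}(i).
$$

Only the designs $X_i$ and the error covariances $\Sigma_{ii}$ enter, which is why $\Phi_t$ is not updated by the observed $y_i$.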


4. SIMPLE TRENDS

Suppose now we believe that the regression parameters are subject to an unknown
linear trend, and that, in fact:

(4.1)  $\beta(t) = \beta_1 + t\beta_2, \quad (t = 1, 2, \ldots, T)$,


with the model design $X$ held constant. In a direct formulation, one would use
$y' = [y_1' : y_2' : \cdots : y_T']$, and an $nT \times 2k$ super-design matrix

(4.2)  $\begin{bmatrix} X & X \\ X & 2X \\ \vdots & \vdots \\ X & TX \end{bmatrix}$;

there are now $2k$ parameters arranged in linear format, with prior mean vector
$b' = [b_1' : b_2']$ and prior covariance

(4.3)  $\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$.


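Note that it is not reasonable to assume that $A_{12} = A_{21}'$ is void, since most
linear models of inflation are proportional in nature, i.e. (4.1) represents

(4.4)  $\beta(t) = (I_k + t\Gamma)\beta_1$,

where $\Gamma$ is a scalar or diagonal matrix of unknown inflation rates; thus
$\beta_2 = \Gamma\beta_1$ will, in general, be correlated with $\beta_1$.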

In Section 3, we have shown how computation using (4.2) could be reduced to a series
of $n \times 2k$ computations using a design matrix of the form $[X : tX]$; for the
rest of this section, we further simplify computation using this special structure.
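A sketch of this reduction on made-up data with $k = 2$, $n = 2$, $T = 3$: the batch forecast uses the $nT \times 2k$ super-design (4.2), while the recursion of Section 3 uses the $n \times 2k$ design $[X : tX]$ at period $t$. The prior covariance (4.3) is given a nonzero $A_{12}$; all numbers and helper names (`update`, `trend_design`, `block_diag`) are illustrative assumptions, and we assume the same $\Sigma_{tt}$ in every period.

```python
# Illustrative sketch: super-design batch (4.2) versus the period-by-period [X : tX] recursion.

def T(A):
    return [list(col) for col in zip(*A)]

def mm(*Ms):
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in out]
    return out

def lin(A, B, s=1.0):
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def inv(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [[float(v) for v in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def maxdiff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def block_diag(*Ms):
    n = sum(len(M) for M in Ms)
    out = [[0.0] * n for _ in range(n)]
    off = 0
    for M in Ms:
        for i, row in enumerate(M):
            out[off + i][off:off + len(row)] = row
        off += len(M)
    return out

def update(b, A, X, S, y):
    """Credibility step (2.4)-(2.5) with Phi from (2.9)."""
    Z = mm(A, T(X), inv(lin(S, mm(X, A, T(X)))))
    I_ZX = lin(eye(len(A)), mm(Z, X), -1.0)
    return lin(mm(I_ZX, b), mm(Z, y)), mm(I_ZX, A)

# Made-up data: k = 2, n = 2, T = 3, same X and Sigma_tt in every period.
X = [[1.0, 0.5], [0.0, 1.0]]
S = [[1.0, 0.0], [0.0, 0.5]]
ys = [[[1.1], [0.9]], [[1.3], [1.2]], [[1.6], [1.4]]]
b = [[1.0], [1.0], [0.1], [0.1]]                    # b' = [b_1' : b_2']
A = [[0.5, 0.0, 0.05, 0.0],                         # (4.3), with A_12 = A_21' nonzero
     [0.0, 0.5, 0.0, 0.05],
     [0.05, 0.0, 0.02, 0.0],
     [0.0, 0.05, 0.0, 0.02]]

def trend_design(t):
    """The n x 2k design [X : tX] for period t."""
    return [row + [t * v for v in row] for row in X]

# Batch: rows of [X : tX] for t = 1..T stacked, i.e. the super-design (4.2).
X_super = [row for t in range(1, len(ys) + 1) for row in trend_design(t)]
y_all = [row for y_t in ys for row in y_t]
f_all, Phi_all = update(b, A, X_super, block_diag(*[S] * len(ys)), y_all)

# Recursion of Section 3: one n x n inversion per period.
b_t, A_t = b, A
for t, y_t in enumerate(ys, start=1):
    b_t, A_t = update(b_t, A_t, trend_design(t), S, y_t)
print(maxdiff(b_t, f_all), maxdiff(A_t, Phi_all))
```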

Initialize by defining $b_i(1) = b_i$ and $A_{ij}(1) = A_{ij}$ $(i, j = 1, 2)$.


Then, if $n \le k$, we find two formulae similar to (2.4) for iteration $t$ by using
$y_t$, $X$, $\Sigma_{tt}$, and calculating

(4.5)  $A_1(t) = A_{11}(t) + tA_{12}(t); \quad A_2(t) = A_{21}(t) + tA_{22}(t); \quad A_0(t) = A_1(t) + tA_2(t)$.


Inverting one $n \times n$ matrix, we find first

(4.6)  $Z_i(t) = A_i(t)X'\bigl(\Sigma_{tt} + XA_0(t)X'\bigr)^{-1} \quad (i = 1, 2)$,

and then update forecasts of $\beta_1$ and $\beta_2$ through:

(4.7)  $b_1(t + 1) = (I_k - Z_1(t)X)b_1(t) - tZ_1(t)Xb_2(t) + Z_1(t)y_t$,

(4.8)  $b_2(t + 1) = -Z_2(t)Xb_1(t) + (I_k - tZ_2(t)X)b_2(t) + Z_2(t)y_t$.


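On the other hand, if $k < n$, we find $\epsilon^{-1}(t)$ from (3.2), invert one
$k \times k$ matrix, and make the following replacements in (4.7), (4.8):

(4.9)  $Z_i(t)X \to A_i(t)\bigl[I_k + \epsilon^{-1}(t)A_0(t)\bigr]^{-1}\epsilon^{-1}(t); \quad Z_i(t)y_t \to A_i(t)\bigl[I_k + \epsilon^{-1}(t)A_0(t)\bigr]^{-1}X'\Sigma_{tt}^{-1}y_t \quad (i = 1, 2)$.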

The four components of the preposterior covariance are then updated without
further inversion through:

(4.10)  $A_{ij}(t + 1) = A_{ij}(t) - Z_i(t)XA_j'(t) \quad (i, j = 1, 2)$.

Thus, in the simple trend case of Bayesian regression, we obtain finally an
iterative sequence of calculations, with a single minimal inversion at each step,
and separate formulae for updating base values and trends of the unknown model
parameters.
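The specialized iteration (4.5)-(4.10) can be sketched as below and checked against the generic $2k$-parameter step with design $[X : tX]$. Both branches are exercised: $n = 1 \le k = 2$ uses the $n \times n$ inversion in (4.6), and $n = 3 > k = 2$ uses the replacements (4.9) with one $k \times k$ inversion. For convenience the sketch computes $\Sigma_{tt}^{-1}$ itself, which the text takes as known; the data and function names (`trend_step`, `generic_step`, `run`) are hypothetical.

```python
# Illustrative sketch of (4.5)-(4.10), checked against the generic 2k-parameter step.

def T(A):
    return [list(col) for col in zip(*A)]

def mm(*Ms):
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in out]
    return out

def lin(A, B, s=1.0):
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def inv(M):
    n = len(M)
    aug = [[float(v) for v in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def maxdiff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def trend_step(b1, b2, A11, A12, A22, X, S, y, t):
    """One period of (4.5)-(4.8) and (4.10); A_21 is taken as A_12'."""
    k, n = len(A11), len(X)
    A1 = lin(A11, A12, t)                                   # (4.5)
    A2 = lin(T(A12), A22, t)
    A0 = lin(A1, A2, t)
    if n <= k:                                              # (4.6): one n x n inversion
        G = mm(T(X), inv(lin(S, mm(X, A0, T(X)))))          # X'(Sigma_tt + X A_0 X')^{-1}
        Z1X, Z2X, Z1y, Z2y = mm(A1, G, X), mm(A2, G, X), mm(A1, G, y), mm(A2, G, y)
    else:                                                   # (4.9): one k x k inversion
        Si = inv(S)                                         # Sigma_tt^{-1}, taken as known in the text
        eps_inv = mm(T(X), Si, X)                           # epsilon^{-1}(t), as in (3.2)
        M = inv(lin(eye(k), mm(eps_inv, A0)))
        XSy = mm(T(X), Si, y)
        Z1X, Z2X, Z1y, Z2y = mm(A1, M, eps_inv), mm(A2, M, eps_inv), mm(A1, M, XSy), mm(A2, M, XSy)
    I = eye(k)
    nb1 = lin(lin(mm(lin(I, Z1X, -1.0), b1), mm(Z1X, b2), -t), Z1y)   # (4.7)
    nb2 = lin(lin(mm(lin(I, Z2X, -t), b2), mm(Z2X, b1), -1.0), Z2y)   # (4.8)
    nA11 = lin(A11, mm(Z1X, T(A1)), -1.0)                              # (4.10), (i, j) = (1, 1)
    nA12 = lin(A12, mm(Z1X, T(A2)), -1.0)                              # (1, 2)
    nA22 = lin(A22, mm(Z2X, T(A2)), -1.0)                              # (2, 2)
    return nb1, nb2, nA11, nA12, nA22

def generic_step(b, A, Xa, S, y):
    """Generic (2.4), (2.5), (2.9) step on the stacked 2k-vector, design Xa = [X : tX]."""
    Z = mm(A, T(Xa), inv(lin(S, mm(Xa, A, T(Xa)))))
    I_ZX = lin(eye(len(A)), mm(Z, Xa), -1.0)
    return lin(mm(I_ZX, b), mm(Z, y)), mm(I_ZX, A)

def run(X, S, ys, b1, b2, A11, A12, A22):
    """Run both recursions for t = 1..T and return the largest discrepancy."""
    k = len(A11)
    b = b1 + b2
    A = [r + s for r, s in zip(A11, A12)] + [r + s for r, s in zip(T(A12), A22)]
    for t, y in enumerate(ys, start=1):
        b1, b2, A11, A12, A22 = trend_step(b1, b2, A11, A12, A22, X, S, y, t)
        b, A = generic_step(b, A, [row + [t * v for v in row] for row in X], S, y)
    return max(maxdiff(b1 + b2, b), maxdiff(A11, [r[:k] for r in A[:k]]),
               maxdiff(A12, [r[k:] for r in A[:k]]), maxdiff(A22, [r[k:] for r in A[k:]]))

# Made-up prior (4.3) with k = 2 and a nonzero A_12.
prior = dict(b1=[[1.0], [2.0]], b2=[[0.1], [0.0]],
             A11=[[0.5, 0.1], [0.1, 0.4]], A12=[[0.05, 0.0], [0.0, 0.04]],
             A22=[[0.02, 0.0], [0.0, 0.01]])
# n = 1 <= k: the n x n branch (4.6).
d_small = run([[1.0, 1.0]], [[0.3]], [[[3.2]], [[3.4]], [[3.5]]], **prior)
# n = 3 > k: the k x k branch (4.9).
X3 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S3 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
d_large = run(X3, S3, [[[1.1], [2.0], [3.2]], [[1.2], [2.1], [3.3]]], **prior)
print(d_small, d_large)
```

Only $A_{11}$, $A_{12}$ and $A_{22}$ are carried, since $A_{21} = A_{12}'$; this mirrors the "four components" of (4.10) while storing three.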